ABSTRACT


This paper presents a theoretical result in the context of realizing high-speed hardware for parallel CRC
checksums. Starting from the serial implementation widely reported in the literature, we have identified a
recursive formula from which our parallel implementation is derived. In comparison with previous works, the
new scheme is faster and more compact and is independent of the technology used in its realization. In our
solution, the number of bits processed in parallel can be different from the degree of the polynomial
generator. Last, we have also developed high-level parametric codes that are capable of generating the
circuits autonomously when only the polynomial is given.

I. Introduction

Cyclic Redundancy Check (CRC) is widely used in data communications and
storage devices as a powerful method for dealing with data errors. It is
also applied to many other fields such as the testing of integrated
circuits and the detection of logical faults [6]. One of the more
established hardware solutions for CRC calculation is the Linear Feedback
Shift Register (LFSR), consisting of a few flip-flops (FFs) and logic
gates. This simple architecture processes bits serially. In some
situations, such as high-speed data communications, the speed of this
serial implementation is absolutely inadequate. In these cases, a parallel
computation of the CRC, where successive units of w bits are handled
simultaneously, is necessary or desirable. Like any other combinatorial
circuit, parallel CRC hardware could be synthesized with only two levels
of gates. This is defined by laws governing digital logic. Unfortunately,
this implies a huge number of gates. Furthermore, the minimization of the
number of gates is an NP-hard optimization problem. Therefore, when
complex circuits must be realized, one generally uses heuristics or seeks
customized solutions.

This paper presents a customized, elegant, and concise formal solution for
building parallel CRC hardware. The new scheme generalizes and improves
previous works. By making use of some mathematical principles, we will
derive a recursive formula that can be used to deduce the parallel CRC
circuits. Furthermore, we will show how to apply this formula and to
generate the CRC circuits automatically. As in modern synthesis tools,
where it is possible to specify the number of inputs of an adder and
automatically generate the necessary logic, we developed the necessary
parametric codes to perform the same tasks with parallel CRC circuits. The
compact representation proposed in the new scheme provides the possibility
of significantly saving hardware and reaching higher frequencies in
comparison with previous works. Finally, in our solution, the degree of
the polynomial generator, m, and the number of bits processed in parallel,
w, can be different.

The article is structured as follows: Section 2 illustrates the key
elements of CRC. In Section 3, we summarize previous works on parallel
CRCs to provide appropriate background. In Section 4, we derive our logic
equations and present the parallel circuit. In addition, we illustrate the
performance by some examples. Finally, in Section 5, we evaluate our
results by comparing them with those presented in previous works. The
codes implemented are included in the Appendix.

II. Cyclic Redundancy Check

As already stated in the introduction, CRC is one of the most powerful
error-detecting codes. Briefly speaking, CRC can be described as follows:
Let us suppose that a transmitter, T, sends a sequence, S1, of k bits,
$\{b_0, b_1, \ldots, b_{k-1}\}$, to a receiver, R. At the same time, T
generates another sequence, S2, of m bits,
$\{b'_0, b'_1, \ldots, b'_{m-1}\}$, to allow the receiver to detect
possible errors. The sequence S2 is commonly known as a Frame Check
Sequence (FCS). It is generated by taking into account the fact that the
complete sequence, $S = S1 \cup S2$, obtained by concatenating S1 and S2,
has the property that it is divisible (following a particular arithmetic)
by some predetermined sequence P, $\{p_0, p_1, \ldots, p_m\}$, of m+1
bits. After T sends S to R, R divides S (i.e., the message and the FCS) by
P, using the same particular arithmetic, after it receives the message. If
there is no remainder, R assumes there was no error. Fig. 1 illustrates
how this mechanism works.

A modulo 2 arithmetic is used in the digital realization of the above
concepts [3]: The product operator is accomplished by a bitwise AND,
whereas both the sum and subtraction are accomplished by bitwise XOR
operators. In this case, a CRC circuit (modulo 2 divisor) can be easily
realized as a special shift register, called LFSR. Fig. 2 shows a typical
architecture. It can be used by both the transmitter and receiver.

In Fig. 2, we show that m FFs have common clock and clear signals. The
input $x'_i$ of the ith FF is obtained by taking an XOR of the (i-1)th FF
output and a term given by the logical AND between $p_i$ and $x_{m-1}$.
The signal $x'_0$ is obtained by taking an XOR of the input d and
$x_{m-1}$. If $p_i$ is zero, only a shift operation is performed (i.e.,
the XOR related to $x'_i$ is not required); otherwise, the feedback
$x_{m-1}$ is XOR-ed with $x_{i-1}$. We point out that the AND gates in
Fig. 2 are unnecessary if the divisor P is time-invariant. The sequence S1
is sent serially to the input d of the circuit starting from the most
significant bit, $b_0$. Let us suppose that the k bits of the sequence S1
are an integral multiple of m, the degree of the divisor P. The process
begins by clearing all FFs. Then, all k bits are sent, one per clock
cycle. Finally, m zero bits are sent through d. In the end, the FCS
appears at the outputs of the FFs.

Another possible implementation of the CRC circuit [7] is shown in Fig. 3.
In this paper, we will call it LFSR2. In this circuit, the outputs of the
FFs (after k clock periods) are the same FCS computed by LFSR. It should
be mentioned that, when LFSR2 is used, no sequence of m zeros has to be
sent through d. So, LFSR2 computes the FCS faster than LFSR. In practice,
the message length is usually much greater than m, so LFSR2 and LFSR have
similar performance.
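The serial procedure described in this section can be sketched in software as follows. This is only an illustrative bit-level model, not the hardware description used in this work: the function names, the list representation poly = [p_0, ..., p_{m-1}] (with p_m = 1 implied), and the small example polynomial x^4 + x + 1 are all assumptions made for the sketch. The first FF input is written as d XOR (p_0 AND x_{m-1}), which coincides with the description above when p_0 = 1.

```python
def lfsr_remainder(bits, poly):
    """Shift `bits` (most significant first) through the serial LFSR.

    poly = [p_0, ..., p_{m-1}]; the leading coefficient p_m = 1 is implied.
    Returns [x_0, ..., x_{m-1}], the FF outputs after the last clock.
    """
    m = len(poly)
    x = [0] * m                       # clear all FFs
    for d in bits:                    # one input bit per clock cycle
        fb = x[m - 1]                 # feedback taken from the last FF
        x = [d ^ (poly[0] & fb)] + [x[i - 1] ^ (poly[i] & fb) for i in range(1, m)]
    return x


def lfsr_fcs(message, poly):
    """LFSR: send the k message bits, then m zero bits (k + m clocks)."""
    return lfsr_remainder(message + [0] * len(poly), poly)


def lfsr2_fcs(message, poly):
    """LFSR2: the input bit is XOR-ed into the feedback (k clocks, no padding)."""
    m = len(poly)
    x = [0] * m
    for d in message:
        fb = x[m - 1] ^ d
        x = [poly[0] & fb] + [x[i - 1] ^ (poly[i] & fb) for i in range(1, m)]
    return x


# Hypothetical example: P = x^4 + x + 1 and a message of k = 8 bits.
POLY = [1, 1, 0, 0]
MESSAGE = [1, 1, 0, 1, 0, 0, 1, 1]

fcs = lfsr_fcs(MESSAGE, POLY)
received = MESSAGE + fcs[::-1]        # FCS appended, highest-order bit first
print(fcs, lfsr2_fcs(MESSAGE, POLY), lfsr_remainder(received, POLY))
```

In this model both circuits yield the same FCS, and dividing the message followed by its FCS leaves a zero remainder, which is the condition the receiver checks.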

III. Previous Methods

The basic hardware implementation for serial CRC computation is the
Linear Feedback Shift Register (LFSR), which consists of FFs and logic
gates.

Figure: Linear Feedback Shift Register-1 (labels shown: FCS, Divisor)

Figure: Simulation and synthesis result of LFSR-1

Figure: Linear Feedback Shift Register-2 (labels shown: FCS, Divisor, Serial sequence)

Figure: Algorithm for F-matrix

Figure: Simulation and synthesis result of LFSR-2


Drawbacks of serial CRC computation:

Serial computation is inadequate for high-speed applications. Because it
processes one bit per clock cycle, it needs a large number of clock cycles
to generate the FCS and detect errors.

IV. Proposed Method

PARALLEL CRC COMPUTATION

There are different techniques for parallel CRC generation, as follows:
1. A table-based algorithm for pipelined CRC calculation.
2. Fast CRC update.
3. F-matrix-based parallel CRC generation.
4. Unfolding, retiming, and pipelining algorithm.

Parallel CRC mainly depends on the generation of the F-matrix. The basic
F-matrix is built from the given generator polynomial, i.e.,
"111011011011100010000011001000001" (the 33 coefficients of the CRC-32
generator, listed from $p_0$ to $p_{32}$).
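As a quick check of how such a coefficient string maps to a polynomial, the short sketch below lists the exponents with nonzero coefficients. It assumes that the 33-bit string reads "111011011011100010000011001000001" and is ordered from p_0 to p_32; the helper name `poly_terms` is hypothetical.

```python
def poly_terms(bits):
    """bits[i] is the coefficient p_i; return the exponents with p_i = 1, highest first."""
    return [i for i in range(len(bits) - 1, -1, -1) if bits[i] == "1"]


# Assumed coefficient string, ordered p_0 ... p_32.
CRC32_BITS = "111011011011100010000011001000001"

terms = poly_terms(CRC32_BITS)
# Hexadecimal form of the polynomial with the x^32 term left out.
low_word = sum(1 << e for e in terms if e < 32)
print(terms, hex(low_word))
```

Under this reading, the string corresponds to x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1, i.e., the generator commonly written as 0x04C11DB7.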

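Derivation of the formula for matrix $F^w$

$$F^{i} = \left[\; F^{i-1}\,[\,p_{m-1} \;\cdots\; p_2 \; p_1 \; p_0\,]^{T} \;\middle|\; \text{the first } m-1 \text{ columns of } F^{i-1} \;\right]$$

Here $[\,p_{m-1} \cdots p_1 \; p_0\,]^{T}$ is taken as a column vector.

If $w < m$, the inputs $[d_{m-1} \ldots d_w]$ are not needed. The inputs
$[d_{w-1} \ldots d_0]$ are the bits of the dividend, sent in groups of
$w$ bits each.

As for the realization of LFSR2, the circuit is very similar to the one
above, except that the inputs d are XORed with the FF outputs and the
results are fed back.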
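The recursion for the powers of F can be illustrated with a small software model. The sketch below is not the parametric hardware code of this work; it rests on several assumptions that are labeled in the comments: the state vector is ordered X = [x_{m-1}, ..., x_0], F^0 is the identity matrix, the block update is X' = F^w X XOR D with D = [0, ..., 0, d_{w-1}, ..., d_0] (first-received bit leftmost), w does not exceed m, and w divides k + m. All function names are hypothetical. A serial reference model is included so that the two can be compared.

```python
def mat_vec(A, v):
    """Matrix-vector product over GF(2): AND as product, XOR as sum."""
    out = []
    for row in A:
        acc = 0
        for a, b in zip(row, v):
            acc ^= a & b
        out.append(acc)
    return out


def f_power(poly, w):
    """F^w for poly = [p_0, ..., p_{m-1}], built with the recursion
    F^i = [ F^(i-1) P | first m-1 columns of F^(i-1) ], assuming F^0 = I."""
    m = len(poly)
    P = poly[::-1]                                    # column [p_{m-1}, ..., p_0]
    F = [[int(r == c) for c in range(m)] for r in range(m)]
    for _ in range(w):
        first_col = mat_vec(F, P)
        F = [[first_col[r]] + F[r][:m - 1] for r in range(m)]
    return F


def parallel_lfsr(message, poly, w):
    """Process w bits per clock: X' = F^w X xor D (assumed form, w <= m)."""
    m = len(poly)
    assert 1 <= w <= m
    stream = message + [0] * m                        # message followed by m zeros
    assert len(stream) % w == 0                       # assumption: w divides k + m
    Fw = f_power(poly, w)
    X = [0] * m                                       # X = [x_{m-1}, ..., x_0]
    for i in range(0, len(stream), w):
        D = [0] * (m - w) + stream[i:i + w]           # w new bits in the low positions
        X = [a ^ b for a, b in zip(mat_vec(Fw, X), D)]
    return X


def serial_lfsr(message, poly):
    """Bit-serial reference model, returned in the same order as parallel_lfsr."""
    m = len(poly)
    x = [0] * m                                       # x[i] is the output of the ith FF
    for d in message + [0] * m:
        fb = x[m - 1]
        x = [d ^ (poly[0] & fb)] + [x[i - 1] ^ (poly[i] & fb) for i in range(1, m)]
    return x[::-1]


# Hypothetical test data: CRC-32 coefficients p_0..p_31 and a 64-bit pattern.
CRC32 = [int(c) for c in "11101101101110001000001100100000"]
MSG = [(i * 7 + 3) % 5 % 2 for i in range(64)]
print(parallel_lfsr(MSG, CRC32, 32) == serial_lfsr(MSG, CRC32))
```

With w = 32 the 96-bit stream (64 message bits plus 32 zeros) is consumed in three block updates instead of 96 single-bit shifts, which is the saving the parallel scheme aims for.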

Figure: Flowchart for F-matrix

Figure: CRC-32 hardware for 32-bit parallel processing (labels shown: Parallel sequence, Enables)

Figure: Simulation results of CRC-32 for 32-bit parallel processing

Figure: Block diagram of 64-bit parallel calculation of CRC-32

Figure: Simulation results of CRC-32 for 64-bit parallel processing
V. Conclusion and Future Scope

This project primarily deals with timing considerations, i.e., the number
of clock cycles needed to generate the FCS.

For high-speed applications, 64-bit parallel processing of CRC-32 is
therefore the most suitable option, because it takes less time to generate
the CRC for error detection. Area can be reduced by replacing AND gates
with NAND gates and XOR gates with XNOR gates: replacing an AND gate with
a NAND gate requires two fewer CMOS transistors, which makes the
architecture more area-efficient. CRC-32 is used in Ethernet frames for
error detection, and CRC-CCITT, which is a 16-bit polynomial, is used in
the X.25 protocol, disc storage, SDLC, and XMODEM.
Comparison of all CRC computational models:

Type of CRC computation    Number of clock cycles used    Time period (ns)
LFSR-1                     96                             196.01
LFSR-2                     64                             131.61
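The clock-cycle counts for the two serial circuits follow from the procedure in Section II: LFSR needs k + m clocks (k message bits plus m zeros), whereas LFSR2 needs only k. The sketch below reproduces the two counts under the assumption that the measurements used a k = 64-bit message with CRC-32 (m = 32); this message length is inferred from the counts and is not stated explicitly. The helper name and the idealized w-bit case are illustrative only and are not measured results.

```python
import math


def clock_cycles(k, m, w=1, lfsr2=False):
    """Idealized number of clocks to obtain the FCS.

    k: message length in bits, m: degree of the generator polynomial,
    w: bits processed per clock (w = 1 is the serial case).
    LFSR shifts in k + m bits; LFSR2 shifts in only the k message bits.
    """
    total_bits = k if lfsr2 else k + m
    return math.ceil(total_bits / w)


# Assumed setup: 64-bit message, CRC-32.
print(clock_cycles(64, 32))                       # serial LFSR-1
print(clock_cycles(64, 32, lfsr2=True))           # serial LFSR-2
print(clock_cycles(64, 32, w=32, lfsr2=True))     # idealized 32-bit parallel case
```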